The PC-SIG Library 9

Text File | 1987-05-04 | 50KB | 1,037 lines

EPISTAT
Statistical Package for the IBM Personal Computer
Version 3.3

Written by: Tracy L. Gustafson, M.D.
Copyright 1986


INTRODUCTION

EPISTAT is a collection of programs written in BASICA for statistical analysis of small to medium-sized data samples ( < 28 samples or variables and < 2000 total data entries per file). The 25 programs in EPISTAT perform more than 40 common statistical tests or functions and provide utilities for data entry, editing, printing, graphing, sorting, selecting, transforming and crosstabs.

The programs are intended to be as self-explanatory and user-friendly as possible. You do not need to memorize this guide before using the programs. On the other hand, neither the programs nor this manual purport to TEACH the proper use or interpretation of statistics. The user must have some familiarity with the kinds of data required and the underlying assumptions appropriate to each statistical test. For further explanations of tests, refer to:

1. Colton, Theodore. Statistics in Medicine. Little, Brown and Co. Boston, 1974.
2. Fleiss, Joseph. Statistical Methods for Rates and Proportions. John Wiley and Sons. New York, 1981.
3. Rosner, Bernard. Fundamentals of Biostatistics. Prindle Weber and Schmidt. Boston, 1982.
4. Snedecor, George W. and Cochran, William G. Statistical Methods. Iowa State Univ. Press. Ames, Iowa, 1978.
5. Schlesselman, James. Case-Control Studies. Oxford Univ. Press. New York, 1982.
6. Zar, Jerrold. Biostatistical Analysis. Prentice-Hall. Englewood Cliffs, New Jersey, 1984.

CAVEAT: These programs have been tested extensively, but I cannot guarantee that they will work correctly with every possible data set. Incorrect results are usually due to errors in format or type of data entered. If you believe you have discovered an error in the programs, please write me. I intend to correct any bugs that are brought to my attention.
It is good practice to regularly compare the results obtained by programs in EPISTAT with results obtained by your previous method of calculation. ANY unexpected result should be questioned and double-checked by reference to tables or another method of calculation.


INDEX TO EPISTAT

The following statistical tests and functions are available:

TEST or FUNCTION                                     PROGRAM NAME
----------------                                     ------------
Analysis of variance (1 and 2-way)...................ANOVA
Bayes' theorem.......................................BAYES
Binomial distribution................................BINOMIAL
Chi-square test and distribution.....................CHISQR
Correlation coefficients.............................CORRELAT
F distribution.......................................ANOVA
Fisher's exact test..................................FISHERS
Linear regression analysis...........................LNREGRES
Mantel-Haenszel Chi-square test......................MHCHISQR
Mantel-Haenszel for multiple controls................MHCHIMLT
McNemar's test.......................................MCNEMAR
Mean, median and standard deviation..................DATA-ONE
Normal distribution..................................NORMAL
Poisson distribution.................................POISSON
Random sample generator..............................RANDOMIZ
Rank sum test........................................RANKTEST
Rates adjusted (direct and indirect).................RATEADJ
Sample size calculations.............................SAMPLSIZ
Signed rank test.....................................RANKTEST
Student's T-test and T distribution..................T-TEST

The following data-handling capabilities are provided:

DATA MANIPULATION                                    PROGRAM NAME
-----------------                                    ------------
Determine best test and program names................EPISTAT
Graph histograms.....................................HISTOGRM
Graph scattergrams...................................SCATRGRM
Perform data transformations.........................LNREGRES
Print data (sorted or input order)...................DATA-ONE
Print crosstab reports...............................XTAB
Select specific records..............................SELECT
Transfer data between EPISTAT files..................FILETRAN
Transfer data from FORTRAN to EPISTAT files..........FORTRANS


SYSTEM REQUIREMENTS FOR EPISTAT

MINIMUM                    OPTIMAL
IBM PC with 64K RAM        IBM PC with 96K RAM
One 160K disk drive        Two 320K disk drives
Monochrome monitor         Color graphics adapter
BASICA                     Hi-res color monitor
                           BASICA
                           IBM, Epson, Okidata, or C. Itoh
                           Prowriter printer with graphics
                           capability


OVERALL PROGRAM DESCRIPTION

All calculations in EPISTAT are performed using single precision. Although it may first appear that double precision would be more appropriate for statistical tests, "double" precision makes little or no real improvement in the accuracy of these programs. For best results, data entries should be numbers between 1E+7 and 1E-7. Larger or smaller numbers should be multiplied by an appropriate power of 10 before entry and analysis in EPISTAT.

All EPISTAT programs are written so that as much pertinent information about the test as possible can fit on the final screen. This feature allows a summary printed copy to be produced simply by pressing <Shift-PrtSc>. This will work any time there is a pause in the program display. Six programs, "DATA-ONE", "HISTOGRM", "RANDOMIZ", "SCATRGRM", "SELECT", and "XTAB", produce printed reports without using <Shift-PrtSc>. In these, follow program instructions to route output to your printer.

EPISTAT is the introductory program in the EPISTAT package. DATA-ONE is the major data entry, editing, and printing program. Most of the programs in EPISTAT can evaluate data entered and saved using DATA-ONE. Many of the programs can, in addition, evaluate summary data. The programs marked with a star (*) below can evaluate data entered in DATA-ONE. Non-starred programs provide their own data entry routines.
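The advice in this section to keep entries between 1E-7 and 1E+7 follows from how few digits a single-precision number holds. The sketch below is only an illustration in Python, not part of EPISTAT: it assumes IEEE 754 single precision as a stand-in for BASICA's own single-precision format (both keep roughly 7 significant digits), and the helper name `to_single` is hypothetical.

```python
import struct

def to_single(x: float) -> float:
    """Round a Python float (a double) to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

# Above 2**24 = 16,777,216 neighbouring whole numbers can no longer be told apart.
print(to_single(16_777_217.0) == 16_777_216.0)

# A 9-digit entry is silently altered when stored in single precision.
print(to_single(123_456_789.0))

# Moderate values such as 0.5 are stored exactly.
print(to_single(0.5))
```

Rescaling does not add digits; it only keeps the data in the range where the roughly 7 digits that survive are the ones that matter.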
The EPISTAT disk should be placed in drive A (or other default drive) when loading any program because "EPIMRG" and "EPISETUP.DAT" are used by every program. Once a program is running, EPISTAT can be removed from drive A if necessary.


INDIVIDUAL PROGRAM DESCRIPTIONS

(1) "EPISTAT"

This introductory program lists the available programs. It also aids the user in selecting the best statistical test. To do so, choose menu option 2 and decide whether you are interested in tests for a single sample, tests for 2 or more samples, other statistical functions, or data handling utilities.

You are also allowed to specify hardware configuration and colors for a color monitor. Choose colors 7,0,0 if you have a monochrome monitor connected to the color/graphics adapter. If yours is not one of the listed printers, check your printer's codes for the typeface you want. For example, the code for elite type on the Prowriter is ESC "E". If you press Escape then E, the display will show the decimal ASCII codes: 27 69. An alternate method is to press <Alt> and enter the decimal code on the numeric keypad. Press <Enter> when the complete code is entered.

(2) "DATA-ONE" *

A. DATA ENTRY:

This is the central keyboard data entry program for the EPISTAT package (for non-keyboard data entry, see FILETRAN and FORTRANS). Initial data entry (Option 1) first asks you to name your samples or variables. Then type in the data, pressing <Enter> after each entry. Press the TAB key to back up one or two items on the SAME ROW. The maximum number of samples or variables (S) allowed is 28 with a color adapter and 7 with a monochrome adapter. The maximum number of records in each sample is 2000/S. A missing value can be entered by pressing <Enter> only. Note that this is different from entering a zero (0). To exit, press key F10. The mean, median and (n-1) standard deviation are then displayed.
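For checking the summary that DATA-ONE displays, the three statistics can be recomputed by hand. The following is a minimal Python sketch, not the BASICA source: the function name `describe` is hypothetical, and it assumes that missing entries are left out of the calculation, which the manual does not state explicitly.

```python
import math
from typing import Optional, Sequence, Tuple

def describe(sample: Sequence[Optional[float]]) -> Tuple[float, float, float]:
    """Return (mean, median, n-1 standard deviation), skipping missing values (None)."""
    values = sorted(v for v in sample if v is not None)
    n = len(values)
    if n < 2:
        raise ValueError("need at least two non-missing values")
    mean = sum(values) / n
    mid = n // 2
    # Middle value for odd n, average of the two middle values for even n.
    median = values[mid] if n % 2 else (values[mid - 1] + values[mid]) / 2
    # Dividing by n-1 gives the sample (not population) variance.
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, median, math.sqrt(variance)

# Hypothetical sample: None is a missing entry, 0 is a real observation.
print(describe([4, None, 0, 8, 6]))
```

Here the missing entry is dropped while the zero is kept as data, which is the distinction the text draws between pressing <Enter> alone and entering 0.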
When you return to the main menu, SAVE your datafile to disk (Option 5) for future modification or use by other programs in the EPISTAT package.

Although all entries in a datafile are treated as numbers by DATA-ONE, it is possible to enter characters (names) in a record. Characters will be treated as zeros in calculations. Nevertheless, it improves data readability to use the "Sample 1" column for record or case names. Thus, DATA-ONE allows one to specify a name for each column (variable) and each row (case) in the datafile.

B. DATA MODIFICATION:

APPEND (Option 2) allows one to add more observations to a sample at a later session. EDIT (Option 3) allows one to delete or replace incorrect data entries and to change sample or variable names. When you return to the main menu, SAVE modified data to disk again.

C. PRINTING DATA:

To view or review a datafile, a printout to screen or printer can be selected (Option 4). To print a datafile exactly as it was keyed in, request the printout in INPUT order. DATA-ONE can also print the data SORTED by any selected sample. Only numeric data is sorted by DATA-ONE, so it will not alphabetize a character field. Blank records are not sorted, either.

D. SAVING DATAFILES and LOADING DATAFILES:

SAVING data (Option 5) writes your data to disk in a sequential file for later editing, review, or use by another program. DATA MUST BE SAVED TO DISK before it can be used by other programs in EPISTAT. Since EPISTAT must be in drive A: (or other default drive) to begin, you will probably want to SAVE datafiles on drive B. To do so, precede each datafile name with B: (e.g. B:TESTDATA). Do not enclose filenames in quotation marks.

(3) "ANOVA" *

A. ONE-way ANOVA:

PURPOSE: To compare the means of 3 or more samples.
DATA REQUIRED: A DATA-ONE datafile with 3 or more columns/variables.
EXAMPLE: Are the mean ages of three groups of individuals significantly different?
COMMENT: Sample means, (n-1) variances, the mean variance and the variance of the means are displayed. Total sum of squares, Treatment sum of squares and Error sum of squares are also shown. Finally the F value, degrees of freedom (df) in the numerator and df in the denominator and p value are given.

B. TWO-way ANOVA:

PURPOSE: To evaluate the combined effects of 2 variables on a third variable (ROW and COLUMN effects).
DATA REQUIRED: A DATA-ONE datafile with at least 2 columns and 2 rows.
EXAMPLE: How much of the variance in transparency of glass types is attributable to the kind of sand and how much to the process used to make it?
COMMENT: All samples in two-way ANOVA must have the same number of elements. Sample means, (n-1) variances, Total sum of squares, Row sum of squares, Column sum of squares and Residual are all displayed. The F value, df in numerator, df in denominator and corresponding p values are shown for both the Row and Column effects.

C. F-value:

PURPOSE: To evaluate the p value associated with a known F value.
DATA REQUIRED: F value, df in numerator, and df in denominator.
REFERENCE: Snedecor, pp. 258-338.

(4) "BAYES"

A. Probabilities of false positive and false negative tests:

PURPOSE: To evaluate a test or procedure in terms of its sensitivity and specificity.
DATA REQUIRED: Sensitivity and specificity of a test in relation to a specific condition it tests for. The estimated incidence of this condition in the population being tested.
EXAMPLE: If a test has a specificity of .99 and a sensitivity of .99, how many false positives will occur in a population where the incidence of this disease is only 100/10,100? Answer: About 50% of positives will be false positives (100 false positives among the 10,000 unaffected, against 99 true positives among the 100 affected).

B. Probability of disease given a positive test:

PURPOSE: To determine the most likely disease given a certain positive test.
DATA REQUIRED: The estimated incidence of several diseases in the test population. (Use `OTHER' as the last disease so that the sum of all percentages is 100).
The probability of a positive test in people known to have each disease (test sensitivities).
EXAMPLE: If antithyroid antibodies are found in patients with diabetes, thyroiditis and other diseases, what is the posterior probability of each diagnosis given a positive test? This will vary as the relative incidence of these diseases varies in the test population.
COMMENT: Although the examples deal with the use of medical tests, the same statistical test applies to the relation of any test for any condition.
REFERENCE: Fleiss, p. 5.

(5) "BINOMIAL"

PURPOSE: The binomial distribution allows calculation of the probability of an observed number compared to a known expected number.
DATA REQUIRED: A dichotomous variable that has an equal probability of occurring in each of N trials.
EXAMPLE: What is the chance of obtaining 2 or fewer heads in 10 tosses of a fair coin? Answer: p = .055
COMMENT: BINOMIAL calculates the ONE-tailed probability of the observed number and all more extreme situations. For example, the ONE-tailed probability of 2 heads in 10 tosses of a coin is the sum of the probabilities for 0, 1 and 2 heads.
REFERENCE: Colton, p. 151.

(6) "CHISQR"

A. Table of data:

PURPOSE: The Chi-square program evaluates a possible relationship between the row variable and the column variable.
DATA REQUIRED: The counts for each cell of the table.
EXAMPLE: Is there a relationship between race and socioeconomic group?
COMMENT: 2 by 2 tables are evaluated using Yates' correction, and the odds ratio and its confidence limits are calculated using Cornfield's method.

B. Chi-square value:

PURPOSE: To evaluate the p value associated with a known Chi-square value.
DATA REQUIRED: The Chi-square value and the degrees of freedom.

C. Chi-square test for trend:

PURPOSE: To evaluate a possible directional relationship between the row variable and the column variable. If the row is exposure level and the column is outcome, the relationship is called a `dose-response.'
DATA REQUIRED: A number that describes each `exposure level'. (If they are not quantifiable, just use consecutive numbers.) The number of cases and controls at each exposure level.
EXAMPLE: Is the risk of lung cancer directionally related to the number of pack-years of smoking?
REFERENCE: Schlesselman, pp. 175, 177.

(7) "CORRELAT" *

A. Pearson's correlation coefficient:

PURPOSE: To assess the linear relationship between two variables.
DATA REQUIRED: A DATA-ONE datafile containing the two samples/variables of interest.
EXAMPLE: How closely do age and blood pressure correlate?
COMMENT: The correlation coefficient is calculated and then tested using the Student's T distribution for the probability that such a correlation would occur by chance.

B. R value:

PURPOSE: To evaluate the p value associated with a known R value.
DATA REQUIRED: The R value and the number of observations in the sample from which it came.

C. Spearman's rank correlation:

PURPOSE: To assess the relationship between two variables that are not normally distributed (and only a small sample is available).
DATA REQUIRED: A DATA-ONE datafile containing the 2 variables of interest.
EXAMPLE: How closely do infants' ages at death correlate with birthweight?
COMMENT: The correlation coefficient is calculated but associated p values are not calculated.
REFERENCE: Colton, p. 212.

(8) "FILETRAN" *

PURPOSE: To transfer a sample or column of data from one EPISTAT datafile to another. This makes it unnecessary to re-enter data, even if you need to compare 2 samples that are in separate datafiles, or you have a data set with more than 28 variables that you split between two or more datafiles. You may create a new datafile by selecting one sample from DATAFILE #1 and another from DATAFILE #2. FILETRAN can also combine two samples by APPENDING one to the other.
DATA REQUIRED: Two DATA-ONE datafiles. First enter the datafile you wish to replace, add or append a sample TO.
Then enter the datafile you wish to transfer data FROM. After the data sample has been added, you may save the data under the original filename, or create a new datafile with the additional data in it. You may also cancel the file modification if you find you have made an error.
EXAMPLE: You performed the same experiment on two different days and analyzed the results separately. Now you want to combine the results of both experiments and analyze the combined data set. FILETRAN will allow you to append the two files together and save that data under a new filename.
COMMENT: If you want to append several columns of data from one datafile to another, do not return to the main menu until all columns have been appended. Exiting between appending will leave large blank spaces in the file.

(9) "FISHERS"

PURPOSE: Fisher's exact test evaluates 2 by 2 tables of discrete variables.
DATA REQUIRED: The counts for each of 4 cells of the table.
EXAMPLE: Is there a relationship between being bald and dying of coronary heart disease?
COMMENT: Fisher's exact test is particularly valuable when the Chi-square test is inappropriate because the expected value for a cell is less than 5. However, this program can evaluate some tables where A+B+C+D > 200.

(10) "FORTRANS"

PURPOSE: To transfer data from an SDF, FORTRAN, or sequential card image file into EPISTAT DATA-ONE format.
DATA REQUIRED: A sequential card image file of equal-length records, each delimited by a carriage return and line feed. The end of file must be marked by a CHR(26). You must know the record length (including spaces, but NOT including the carriage return and line feed at the end of each line), the beginning column number and width of each data item you want to transfer. If your datafile contains understood (but not marked) decimal places, then enter the number of decimal places. If your datafile contains marked decimal places, then enter 0 for (understood) decimal places.
Finally, specify a missing value code like 9999. If you have no missing values, then enter a code that does not occur in your data set.
EXAMPLE: You have a FORTRAN file on the mainframe with 10 years' worth of data. You can select a subset of that data from a 6-month period and read that into EPISTAT for some pilot analyses before using mainframe time to analyze the entire data set.
COMMENT: FORTRANS can be used to extract selected data items from DBASE(R) "SDF" type files and from LOTUS(R) "PRN" print files. Be sure to first look at the datafile you create from DBASE or LOTUS with your word processor in non-document mode to be sure that all records are of equal length and that you know which columns contain which data items. Some programs add extra spaces here and there when creating an SDF file. FORTRANS will not successfully read a datafile with more than 255 columns of data in each record.

(11) "HISTOGRM" *

PURPOSE: To graph a data sample according to user specifications in the form of a histogram on the high resolution graphics screen.
DATA REQUIRED: A DATA-ONE datafile. The full name of the variable to be graphed, its units, and the width of each cell in the histogram.
EXAMPLE: What is the distribution of scores on the last exam?
COMMENT: You determine the appearance of the report by entering a label for the horizontal axis and the interval width. To obtain a printed copy on the IBM, Epson, Okidata or Prowriter printer (specified in "EPISTAT" when you set up), press key F1. Press F10 to return to the program.

(12) "LNREGRES" *

A. Linear regression:

PURPOSE: To calculate the least-squares regression line for paired samples.
DATA REQUIRED: A DATA-ONE datafile and the sample numbers of the predictor and dependent variables.
EXAMPLE: What is the regression line relating IQ to income?
COMMENT: The regression line is displayed in the form Y = b + aX. The T distribution is applied to determine if the calculated slope is significantly different from zero.
The T value, degrees of freedom and p value are shown.
REFERENCE: Colton, p. 199.

B. Data transformations:

PURPOSE: To change a data set in a regular way, either to normalize it or to identify a non-linear relationship between two variables.
DATA REQUIRED: A DATA-ONE datafile with fewer than 28 variables in it.
EXAMPLE: In my sample, IQ and income were not linearly related, so I will try a transformation to see if they are related logarithmically.
COMMENT: Nine transformations are available:

    1. Ax + B                   6. A * ln(x) + B
    2. A(x)squared + B          7. ln(x/(100-x))
    3. A*square root(x) + B     8. Sample A + Sample B
    4. A/x + B                  9. Sample A * Sample B
    5. x - mean

Specify the value for A and B and the program will apply that formula to each value in the sample you want transformed. It then adds this transformed sample to the datafile as an additional column/variable. You may save the new datafile containing this transformed variable under the old name or under a new datafile name as you choose.

(13) "MHCHISQR"

PURPOSE: To evaluate the relationship between two discrete variables while controlling for the effect of a third variable.
DATA REQUIRED: The names of the factors you wish to test for and control for, as well as the counts of cases and controls that have and do not have the test and control variables. This is the equivalent of a series of 2 by 2 tables, one for each category of the control variable.
EXAMPLE: Is there a relationship between smoking and lung cancer, controlled for occupation?
COMMENT: The factor you are testing must be dichotomous, but the control variable may have more than 2 categories. The Chi-square value, degrees of freedom, and p value are displayed. Also shown are an odds ratio and 95% confidence limits on the odds ratio.
REFERENCE: Schlesselman, pp. 183, 206.

(14) "MHCHIMLT" *

PURPOSE: To evaluate the relationship between cases and controls and a test factor when each case is matched with 2 or more controls.
DATA REQUIRED: A DATA-ONE datafile or manually entered summary data. If using DATA-ONE, a case sample and 2 or more control samples should be present. Data is coded as "1" for factor present, and "0" for factor absent in each case and control sample.
EXAMPLE: Is there a relationship between illness and eating raw potatoes?
COMMENT: The Chi-square value, degrees of freedom and p value are displayed. Also shown are an odds ratio and 95% confidence limits on the odds ratio. This test does not apply if each case is matched with a different number of controls.
REFERENCE: Fleiss, p. 125.

(15) "MCNEMAR"

PURPOSE: Also called a paired Chi-square test, McNemar's test evaluates a relationship between two variables by analyzing the number of discordant PAIRS.
DATA REQUIRED: The name of the factor being tested in CASES and CONTROLS and the number of pairs that belong in each of 4 cells.
EXAMPLE: In twins in which one developed a stroke and the other did not, is there a relationship between high-fat diet and stroke?
COMMENT: The Chi-square value is calculated using Yates' correction, and degrees of freedom and p value are displayed. Also shown are an odds ratio and 95% confidence limits on the odds ratio.
REFERENCE: Schlesselman, p. 210.

(16) "NORMAL" *

A. Comparing a sample mean to the population mean:

PURPOSE: To see if your sample mean is different from a known population mean.
DATA REQUIRED: A DATA-ONE datafile and a known population mean.
EXAMPLE: Is the mean blood pressure in my sample statistically different from the U.S. population mean?
COMMENT: The mean for the sample and the p value are displayed.

B. Percent of test values in a given range:

PURPOSE: To determine the percent of sample values that will fall between two values in a normally distributed population.
DATA REQUIRED: The mean and standard deviation of the population being sampled. The upper and lower limits of the range in question.
EXAMPLE: If the population mean height is 70 inches and the standard deviation is 3 inches, what proportion of the population are at least 65 inches but no more than 73 inches tall? Answer: 79.4 % of the population.

C. Z value:

PURPOSE: To evaluate the p value associated with a known Z value.
DATA REQUIRED: The known Z value.
COMMENT: A two-tailed p value is returned.

(17) "POISSON"

PURPOSE: To determine the probability of a certain number of cases or events, when the expected rate is known but the number of times when the case or event did not occur cannot be counted.
DATA REQUIRED: The number of cases observed and the expected number of cases (calculated as expected rate * time interval).
EXAMPLE: Is it unusual for lightning to strike 5 people in one county this year, given that in the last 5 years lightning has struck only 8 people in this county? Answer: p = .024
COMMENT: The ONE-tailed probability of observing the given number AND all more extreme cases is displayed.

(18) "RANDOMIZ"

A. Survey sample:

PURPOSE: To provide a series of random numbers to aid in selecting a survey sample from a large number of possible respondents.
DATA REQUIRED: The smallest number and the largest number you want, and the number of random numbers between those values you want selected.
EXAMPLE: I want to survey 100 individuals from the pages of the telephone book. The telephone book has 700 pages so I will ask for 100 numbers between 1 and 700 and then phone the tenth person on each of the randomly selected pages.

B. Unpaired case-control sample:

PURPOSE: To assign subjects to two equal groups randomly.
DATA REQUIRED: The total number of subjects in the study.
EXAMPLE: Assign 50 patients to receive drug A and 50 to receive drug B.
COMMENT: You are also asked if subjects will enter the study over a period longer than one month. If so, you are warned that in many studies it is preferable to randomize each month's cases independently, so that seasonal biases do not creep in.

C. Paired case-control sample:

PURPOSE: To assign members of pairs to case and control groups randomly.
DATA REQUIRED: The total number of pairs. You must also decide on an objective way of deciding which one of each pair is #1 and which is #2.
EXAMPLE: Assign 20 pairs of patients to case and control groups randomly.
COMMENT: Consecutive order of patients admitted to the hospital is not always a satisfactory method of deciding which of each pair is #1 and which is #2. Alphabetic criteria, day of week, or other criteria entirely beyond the investigator's control are usually better.
REFERENCE: Colton, p. 259.

(19) "RANKTEST" *

A. Rank sum test:

PURPOSE: To evaluate the difference between two unpaired non-parametric samples. Comparable to the unpaired T-test for normally distributed samples. It also specifically applies when quantitative variables are not available but qualitative ranks are.
DATA REQUIRED: A DATA-ONE datafile or the number of observations in each of two samples and the sum of ranks for the first sample.
EXAMPLE: Is the duration of remission different for leukemia patients treated with regimen #1 compared with regimen #2? Duration of remission is measured in months and 8 cases and 10 controls have been followed for 5 years.
COMMENT: If a DATA-ONE file is used, the medians and sums of ranks are displayed for both groups. The two-tailed exact p value is then calculated. For large samples ( N1+N2 > 24 ), the normal approximation is used to calculate probabilities. Note that even non-parametric samples larger than 30 can often be evaluated with parametric tests like the T-test (the central limit theorem).

B. Signed rank test:

PURPOSE: To evaluate the difference between two paired non-parametric samples. Comparable to the paired T-test for normally distributed samples. It also specifically applies when quantitative variables are not available but qualitative ranks are.
DATA REQUIRED: A DATA-ONE datafile or the number of non-zero differences ranked and the sum of negative and then the sum of positive-signed ranks.
EXAMPLE: For paired rats from the same litter, does extra dietary vitamin E shorten the time it takes to complete a maze?
COMMENT: If a DATA-ONE file is used, the medians and sums of ranks are displayed for both groups. The two-tailed exact p value is then calculated. However, for large samples ( N > 20 ), the normal approximation is used to calculate probabilities.
REFERENCE: Colton, pp. 219-222.

(20) "RATEADJ" *

A. Direct rate adjustment:

PURPOSE: To adjust a rate to a standard population for comparison to other published rates.
DATA REQUIRED: A DATA-ONE datafile that includes one sample containing the study rates to be adjusted (e.g. the rate in each age group if age-adjusting). A second sample must contain the standard population counts for the same groups. Rates in the first sample may use any denominator (per 1000, per million, etc), as you supply that denominator at the time of the calculation.
EXAMPLE: Studying bladder cancer in Eskimos, you want to age-adjust to the standard U.S. population to compare to other studies.
COMMENT: Direct adjustment may not be appropriate if the number of cases in any one cell is fewer than 5.

B. Indirect rate adjustment:

PURPOSE: To adjust sample observations to a standard population rate for comparison to other published rates.
DATA REQUIRED: A DATA-ONE datafile that includes one sample containing the number of cases observed in the study. A second sample must contain the standard population rates for the same groups. The standard population rates may use any denominator (per 1000, per million, etc), as you supply that denominator at the time of the calculation.
EXAMPLE: Studying bladder cancer in Eskimos, you find only 2 or 3 cases in several of the younger age groups. You want to age-adjust to standard U.S. population rates to compare to other studies.
COMMENT: In addition to age-adjusting, RATEADJ will calculate the probability of observing the number of cases (total) that you observed in your study. Enter the number observed and the Expected number will be displayed as well as the one-tailed POISSON probability of this outcome. The adjusted rate is displayed in the form: ` X times the standard population rate.'
REFERENCE: Colton, pp. 47-51.

(21) "SAMPLSIZ"

A. Survey sample size:

PURPOSE: To determine the sample size required for a survey sample.
DATA REQUIRED: The approximate size of the population from which you plan to draw the sample, your estimate of the rate of the study characteristic (the result of your study), the accuracy you require, and the z(alpha) level you wish to test.
EXAMPLE: What sample size is required to determine the immunization levels in 2 year-olds within 1% of the true value, given that there are 100,000 2 year-olds in the state, and we believe that 95% are immunized? Let z(alpha) correspond to 95% certainty. Answer: N = 1792
COMMENT:
    TP = total population
    pi = population proportion
    d  = maximum acceptable error in sample proportion

    n = [ z(a)*SQR(pi*(1-pi)) / d ] squared
    N = n / (1 + n/TP)

B. Sample size for a paired case-control study:

PURPOSE: To determine the number of cases and controls required for a paired case-control study.
DATA REQUIRED: An estimate of the population rate of the study characteristic, the smallest difference you wish to be able to detect, and the z(beta) and z(alpha) levels of certainty you require.
EXAMPLE: Paired rats are fed a normal diet plus or minus a suspected carcinogen. How many rat pairs must be studied to detect a 1% increase in the population cancer rate of 3%, given that z(beta) = 90% and z(alpha) = 95%? Answer: N = 3429
COMMENT:
    N = [ (z(a)*SQR(pi*(1-pi)) + |z(b)|*SQR(PT*(1-PT))) / (PT-pi) ] squared
REFERENCE: Colton, p. 161.

C. Sample size for an unpaired case-control study:

PURPOSE: To determine the number of cases and controls required for an unpaired case-control study.
DATA REQUIRED: An estimate of the Control group rate (used as the population rate), whether the test group will be higher or lower than the controls, the smallest difference you wish to be able to detect, and the z(beta) and z(alpha) levels of certainty you require.
EXAMPLE: How many case and control animals should be studied to determine if a new antibiotic cures cattle disease 10% better than current standard therapy? Current therapy cures 70% of animals. Let z(beta) = 90% and z(alpha) = 95%. Answer: 392 cases and 392 controls.
COMMENT:
    N = [ (z(a)*SQR(2*pi*(1-pi)) + |z(b)|*SQR(PT*(1-PT)+PC*(1-PC))) / (PT-PC) ] squared
REFERENCE: Fleiss, p. 41 and Schlesselman, p. 168.

(22) "SCATRGRM" *

PURPOSE: To graph the relationship between paired variables according to user specifications on the high resolution graphics screen. To display the linear regression line.
DATA REQUIRED: A DATA-ONE datafile containing two paired variables. The minimum and maximum values in each variable are displayed. You specify the labels and units to be printed on horizontal and vertical axes. Then enter an interval width for each variable.
EXAMPLE: Graph the relationship between advertising expenditures and gross sales based on the last 10 years of experience at Company A.
COMMENT: Be sure to pick an interval width that will result in 20 or fewer intervals on the vertical, and 60 or fewer intervals on the horizontal axis. To display the linear regression line press key F5. The formula for this regression line is displayed in LNREGRES (number 12 above). To obtain a printed copy on the IBM, Epson, Okidata or Prowriter (specified in "EPISTAT"), press key F1. Press key F10 to return to the program.

(23) "SELECT" *

PURPOSE: To select a subset of a datafile based on user specifications.
Data can be selected for printing, or to create a new datafile on disk.
DATA REQUIRED: A DATA-ONE datafile and knowledge of the selection criteria you want to apply. You can select on any variable with "AND" and "OR" specifications. As many as 10 selection criteria can be set at one time. SELECT assumes that "AND"s are in parentheses; that is, each "AND" is evaluated before any "OR". For example:
   "SELECT IF Sample #1>10 AND Sample #2=1 OR Sample #1<Sample #3"
is interpreted as meaning:
   "SELECT IF (Sample #1>10 AND Sample #2=1) OR Sample #1<Sample #3"
EXAMPLE: You have a datafile containing all of the quality control results for a particular machine part this month. You want a new file created which contains only those parts that failed specifications. To do so, select all the samples that exceed the quality criteria.

(24) "T-TEST" *

A. Paired and unpaired T-test:
PURPOSE: To determine if the means of two samples are statistically different.
DATA REQUIRED: A DATA-ONE datafile with the two samples to be compared. If a paired test is being performed, both samples must contain the same number of items.
EXAMPLE: Is the mean weight gain of a herd fed on new Brand X significantly greater than the weight gain of a second herd fed the standard brand feed?
COMMENT: The means and variances of the two samples will be displayed, followed by the T value, degrees of freedom, and the p value. For the unpaired T-test, the equality of variances is tested to be sure that the assumptions of the T-test are met. If the variances are statistically different, the F value supporting that conclusion will be displayed. The confidence limits on the difference between the two means are also displayed.
REFERENCE: Snedecor, p. 116.

B. T value:
PURPOSE: To evaluate the p value associated with a given T value.
DATA REQUIRED: The T value and the degrees of freedom.

(25) "XTAB" *

PURPOSE: To crosstabulate data in 1-, 2- or 3-way reports. This provides the tabular counterpart of a scattergram.
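As a rough illustration of what a 2-way crosstab computes, the Python sketch below counts the cases that fall into each combination of row and column intervals. It is not EPISTAT's BASICA code: the function name `crosstab`, the choice to start the intervals at each variable's minimum, the use of `None` to stand for a blank, and the sample data are all assumptions made for this example.

```python
from collections import Counter

def crosstab(rows, cols, row_width, col_width):
    """Two-way table of counts (hypothetical helper, not part of EPISTAT).

    A case with a blank (None) in either variable is tallied as missing
    rather than placed in a cell.
    """
    pairs = [(r, c) for r, c in zip(rows, cols)
             if r is not None and c is not None]
    missing = len(rows) - len(pairs)
    row_min = min(r for r, _ in pairs)
    col_min = min(c for _, c in pairs)
    table = Counter()
    for r, c in pairs:
        # Interval index counted from the variable's minimum value.
        row_bin = int((r - row_min) // row_width)
        col_bin = int((c - col_min) // col_width)
        table[(row_bin, col_bin)] += 1
    return table, missing

# Made-up data: ages in years, and sex coded 1 or 2 (None marks a blank).
ages = [2, 15, 34, 8, None, 61, 27]
sex = [1, 2, 1, 1, 2, None, 2]
table, missing = crosstab(ages, sex, row_width=20, col_width=1)

for (row_bin, col_bin), count in sorted(table.items()):
    print(f"age interval {row_bin}, sex code interval {col_bin}: {count}")
print("missing:", missing)
```

Coded data with sequential integers (such as the sex codes here) use a width of 1, so each code gets its own column.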
DATA REQUIRED: A DATA-ONE datafile containing at least as many variables as the number of ways you want to crosstabulate. The minimum and maximum values for each sample will be displayed, and then you choose the interval width for each cell of the table. If you have coded data with sequential integers, choose a width of 1. If you have quantitative data, it is usually best to choose an interval that will result in fewer than 10 cells, or the table will be difficult to read. In addition to choosing the interval, you are offered the opportunity to label each row and column interval with a label of your choice to make a more readable report.
EXAMPLE: What is the age by sex breakdown of hospitalized cases of meningitis?
COMMENT: The crosstab report is printed on screen or printer. The number of missing values displayed is the number of cases where one or more of the samples involved contained a blank.

THE EXAMPLE DATAFILE

An example datafile, named "EXAMPLE", showing a sample of people, their ages and their systolic blood pressures, is included on the EPISTAT disk. To gain some familiarity with the appearance of an EPISTAT datafile, follow these steps:

1.) Press <Ctrl>, <Alt> and <Del> at the same time (or load BASICA, then type RUN "EPISTAT") to run the introductory program. Do not change the default configuration for now, but move on to the main menu.
2.) Choose Menu option 3 to run specific programs in the EPISTAT package.
3.) Choose program number 2 to run "DATA-ONE", the main data entry and printing program in EPISTAT.
4.) Choose Menu option 6 to load data from disk. Then enter the filename EXAMPLE without any quotation marks.
5.) Return to the main DATA-ONE menu and choose option 4 to print this datafile on your screen or printer. Print it once in input order, then try printing it sorted by Sample 2 or 3.
6.) Choose menu option 7 to exit DATA-ONE, then enter Y because EXAMPLE was already saved to disk.
Choose other EPISTAT program numbers to run ANOVA, HISTOGRM, LNREGRES, SCATRGRM, or XTAB with this datafile.
7.) Return to DATA-ONE to enter your own data for analysis.

NOTICE
---------------------------------------------------------------------
Users may copy EPISTAT and distribute it to others on the following conditions:
1. The programs are not modified in any way.
2. Individual programs are not distributed separately.
3. No fee is charged for copying or distribution.
---------------------------------------------------------------------

====USER-SUPPORTED SOFTWARE====

The concept of user-supported software is based on three principles:
1. The value and utility of a software package are best assessed by each user on his or her own system with his or her own data. Only after using a program can one determine whether it serves one's personal applications, needs, and tastes.
2. The creation of independent personal computer software requires a substantial commitment of time and effort. Rather than replicate this effort time after time, the computing community can and should support individual creative efforts.
3. By encouraging users to copy programs, rather than spending large sums on copy-protection, authors can supply quality software at reduced cost. Users will support useful programs.

If, after using EPISTAT, you find it of value, your contribution in any amount will be appreciated ($25 suggested). If you are interested in a more sophisticated statistical package, write or call about the new TRUE EPISTAT.

Send contributions to:
Tracy L. Gustafson, M.D.
2011 Cap Rock Circle
Richardson, Texas 75080
214-680-1376

Thank you.